An integrated genomic approach identifies follistatin as a target of the p63-epidermal growth factor receptor oncogenic network in head and neck squamous cell carcinoma

Abstract Although numerous putative oncogenes have been associated with the etiology of head and neck squamous cell carcinoma (HNSCC), the mechanisms by which these oncogenes and their downstream targets mediate tumor progression have not been fully elucidated. We performed an integrative analysis to identify a crucial set of targets of the oncogenic transcription factor p63 that are common across multiple transcriptomic datasets obtained from HNSCC patients, and representative cell line models. Notably, our analysis revealed FST which encodes follistatin, a secreted glycoprotein that inhibits the transforming growth factor TGFβ/activin signaling pathways, to be a direct transcriptional target of p63. In addition, we found that FST expression is also driven by epidermal growth factor receptor EGFR signaling, thus mediating a functional link between the TGF-β and EGFR pathways. We show through loss- and gain-of-function studies that FST predominantly imparts a tumor-growth and migratory phenotype in HNSCC cells. Furthermore, analysis of single-cell RNA sequencing data from HNSCC patients unveiled cancer cells as the dominant source of FST within the tumor microenvironment and exposed a correlation between the expression of FST and its regulators with immune infiltrates. We propose FST as a prognostic biomarker for patient survival and a compelling candidate mediating the broad effects of p63 on the tumor and its associated microenvironment.


INTRODUCTION
Cancers deri v ed from the mucosal epithelia of the oral cavity, pharynx and larynx, collecti v ely known as head and neck squamous cell carcinoma (HNSCC), continue to contribute significantly to global cancer statistics ( 1 , 2 ). Risk factors for HNSCC include alcohol and / or tobacco consumption and human papillomavirus infection. The etiological, biological and clinical heterogeneities underlying this disease poses major challenges for effecti v e and targeted tr eatment. P atients afflicted with HNSCC are often not diagnosed until the disease is at an advanced stage, necessita ting multimodal trea tment plans, including surgery, r adiother apy and chemother ap y ( 3 ). Despite gr eat consideration for tumor localization and disease stage, current approaches for HNSCC yield relati v ely poor therapy outcomes, as they overlook the underl ying etiolo gy and molecular heterogeneity of this malignancy ( 4 ). Hence, the identification of prognostic biomarkers and drivers of HNSCC is critical for early disease detection and targeted treatments.
Large-scale genomic studies have begun to reveal the molecular complexity of HNSCC, including intra-and inter-tumor heterogeneities as well as shared gene expression profiles with other cancers of epithelial origin (5)(6)(7). These studies highlight the presence of tumor suppressor inactiva ting muta tions, activa ting onco genic m utations and mor e r ecentl y, onco genic fusion e v ents ( 5 , 8 , 9 ), howe v er it is likely that additional molecular e v ents further contribute to oncogenic processes. Hence, continued le v eraging of the genomic and genetic landscape of HNSCC might identify novel targets, and guide the development of more effective therapies. In this regar d, e xploration of the Transcription Factor (TF) networks as direct or indirect targets for therapeutic intervention, as well as the potential of specific TFs as biomarkers for better predicting and monitoring treatment response might be of use. One such pivotal candidate TF is tumor protein p63 (p63) which directs the squamous phenotype in HNSCC, a role that also extends to other epithelial cancers such as non-small cell lung cancer and a subset of aggressi v e pancreatic and bladder cancers (10)(11)(12)(13)(14)(15).
As the master regulator of epithelial de v elopment, p63 is responsible for the renewal and lineage commitment of stem cells that form epithelium-rich tissues and organs (16)(17)(18)(19). This deterministic role of p63, primarily dri v en by the Np63 isoform, also extends to squamous carcinomas, in which the actions of p63 are amplified through oncogenic signaling pathways that control a broad range of target genes (20)(21)(22)(23). The interaction of p63 with its cofactors and chromatin modifiers further expands its sphere of influence to practically e v ery hallmar k of cancer (24)(25)(26)(27)(28). Ne v ertheless, many aspects of p63 biology, including its full repertoire of do wnstream tar gets and their effect on HNSCC development and progression remain to be elucidated.
To uncover these targets, we performed an integrati v e patient-centered analysis using bioinformatics-dri v en data mining of the Cancer Genome Atlas (TCGA) dataset. When combined with a comprehensi v e RNA-seq and ChIPseq analysis of r epr esentati v e HNSCC cell lines, our analysis allowed us to generate a global p63 cistrome and a disease-relevant broad oncogenic gene signature. Specifically, we le v eraged p63-associated gene expression profiles of HNSCC tumors compared to normal tissue adjacent to the tumor and the super-enhancer landscape of HNSCC cell lines to identify and short list a number of known and novel oncogenes and tumor suppressors that are likely to be key mediators of p63 function. Our study highlights several genes in the EGFR and TGF-␤ signaling pathways as key constituents of the p63-gene signature that can stratify tumors from normal tissues in an independent cohort of HNSCC patients. We identified Follistatin (FST), a secreted glycoprotein and inhibitor of transforming growth factor TGF-␤/ Activin signaling pathways, as a p63 target in the epithelial component of the tumor micr oenvir onment (TME) that might play a role in directing stemness and metastasis in HNSCC. Interestingly, our analysis also revealed a hitherto unreported role for both p63 and the epidermal growth factor receptor (EGFR) signaling pathway in driving epithelial FST expression in HNSCC. Finally, we probed bulk and single cell RNA-seq HNSCC datasets to establish the cell-type specific enrichment of FST expression and discovered a possible link between the p63-EGFR-FST axis and the immune component of the HNSCC TME. Taken together, our studies offer new mechanistic insights into the p63 regulatory network that is operational in HN-SCC and identifies FST as a predicti v e biomar ker and an attracti v e potential candidate for therapeutic targeting in HNSCC.

Cell culture
SCC25 cells were grown in Dulbecco's modified eagle medium with nutrient mixture F-12 supplemented with 1% hydrocortisone. A253 cells (a gift from Jill Kramer, University a t Buf falo) were grown in McCoy's 5A modified medium. CAL27 cells, were obtained from ATCC and cultured in Dulbecco's modified Eagle's medium. All growth media were supplemented with 1% penicillin / streptomycin (100 IU / ml) and 10% fetal bovine serum and replaced every 48 h. Once the cultures reached 80% confluency, cells were detached with TrypLE (Gibco Life Technologies) and passaged. All cell lines used in the study were authenticated by short tandem repeat profiling and periodically tested for mycoplasma contamination.

Growth factor treatment and ERK inhibitor experiments
Cells were serum-starved for 24 h prior to the addition of growth factors. For initial experiments we used 50 ng each of epidermal growth factor (EGF, Cat: 236-EG), Activin A (Cat: 338-AC-010), Bone Morphogenic Protein (BMP7, Cat: 254-BP-010) and transforming growth factor beta 1 (TGF ␤1, Cat: 240-B-002) (all growth factors were obtained from R&D Systems). Subsequent experiments used lower concentrations of EGF as indicated in the figure legends. Cells were incubated at 37 • C for 2, 4 or 6 h with growth factors before they were processed for protein extraction. ERK1 / 2 inhibition was accomplished by treating cells grown in regular media with Selumetinib, ADZ6244 (Selleckchem) for 24 h. To block EGF-mediated ERK1 / 2 activation, cells wer e tr eated with ADZ6244 for 15-30 min prior to the addition of 10ng EGF, following which the cells were incubated for an additional 6hrs before collection and protein extraction.

Lentiviral mediated knockdown and o ver expr ession
Lentivirus-mediated depletion of p63 was performed using the pGIPZ system as previously described ( 29 ). pGIPZ lentiviral small hairpin RN A (shRN A#1; clone ID pV2LHS 201706; shRNA#2; clone ID pV2LHS 199988) targeting FST was obtained from gene modulation core facility at Roswell Park, Buffalo, NY. The lentiviral particles were packaged in HEK-293T cells by using a translentiviral shRNA packaging system and harvested from the cell supernatant 48 h later. Target cell lines were transduced with the virus in the presence of 4 g / ml Polybrene to aid transduction efficiency. Stably transduced target cells were selected using 2 g / ml puromy cin. Stab le transduction was also confirmed via gr een fluor escent protein (GFP) expression.
A plasmid harboring the full-length human FST transcript variant 344, and an in-frame 3 FLAG tag in the pcDNA3.1+ / C-(K)-DYK vector was obtained from Gen-Script. The FST cDNA was excised from its parent plasmid and subcloned into the EcoRI-Xho1 site in the pLEX vector (Open Biosystems) for stable expression of FLAGtagged FST in target cells. The FST transcript variant 344 encodes the 315 kDa FST isoform (FST 315) which is more relevant to our study due to its high expression in cancer cells.

Clonogenic assay
HNSCC cells were seeded in 6-well pla tes a t a density of 500 cells / well and maintained for 9 days at 37 • C and 5% CO 2 . The cells were fixed for 2 h in an acetic acid-methanol solution (1:3 [v ol / v ol]), washed twice with phospha te-buf fered saline (pH 7.0) and stained with 2% (wt / vol) methylene blue in 50% ethanol for 20 min. Excess stain was rinsed off with tap water, and the plates were air dried before imaging and cell counting.

Spheroid formation and migration analysis
HNSCC cells were seeded at a density of 5000 cells / well in a 96-well ultra-low attachment plate (catalog number 7007; Corning, Corning, NY). The cells were pelleted by centrifuging the plates at 1250 rpm for 3 min and then incubated for 24-72 h (37 • C and 5% CO 2 ) to obtain single self-organizing spheroids in each well. The spheroids were then transferred to 35-mm culture dishes or 24-well plates wher e they wer e imaged and measur ed (using ImageJ) and returned to the incubator. After 96 hours, the spheroids were imaged again and the area covered by the migrated cells was determined by dividing the area covered by migrated cells by the area covered by the original spheroid.

Invasion assay
Cell invasion assays were carried out using the Corning ® BioCoa t ™ Ma trigel ® Invasion Chamber system (Corning), according to manufacturer protocol. Briefly, a pproximatel y 50,000 cells in serum-free media were seeded into 24-well plate transwell inserts in triplicate. The inserts were then car efully lower ed into wells with media containing serum (as chemoattractant). The cells were allowed to incubate at 37 • c for 24 h for A253 cells and 48 h for SCC25 cells. Following this, the unmigrated cells were removed with moist cotton swabs, while the migrated cells wer e fix ed using 100% methanol and stained with 2% methylene blue. As an alternati v e approach, the fixation step was omitted and the GFP-signal from the cells was imaged using a fluorescent microscope.

RNA isolation and sequencing
Total RNA from cells was purified using a Direct-zol RNA Mini-prep kit (Zymo Research) according to the manufacturer's instructions. The purified RNA was snap-frozen on dry ice and stored at −80 • C before library preparation. cDNA libraries were prepared using the TruSeq RNA sample preparation kit (Illumina) and sequenced on an Illumina HiSeq 2500 sequencer. Quality control metrics of the raw sequencing r eads wer e determined using FASTQC v0.4.3. Reads were mapped to the reference human genome (GRCh38 / hg19 build) with HISAT v2.1.0 (30)(31)(32). Reads aligning to the r efer ence genome wer e quantified with fea-tureCounts v1.5.3 ( 33 ) to generate a matrix of counts, with columns corresponding to samples and rows containing the genes names. This was then imported into R (v 3.4) for further processing to generate normalized expression values in transcripts per million (TPM) values according to the method proposed by Wagner et al. ( 34 ). Differential gene expression analyses comparing control and test conditions were carried out using DESeq2 (v1.24.0) ( 35 ). Differentially expressed genes (DEGs) with a false-discovery rate value of ≤0.1 were considered statistically significant.

TCGA correlation analysis and derivation of gene signature
TPM-normalized values for both normal and tumor samples of the TCGA-HNSCC data set were retrie v ed from the Gene Expression Omnibus (GEO) w e bsite using the accession number GSE62944 ( 36 ). Pairwise correlation analysis was performed using the Hmisc package v5.0-1 in R, by comparing the expression of p63 across the normal and cancer datasets to e v ery other gene in the matrix. A subset ( n = 44) of the TCGA-HNSC cancer dataset for which matched normal gene expression samples were available was used for this analysis. Next, differential gene expression analysis was performed by comparing the 44 normal samples to the 502 cancer samples using DESeq2 v 1.36 ( 35 ). Significant genes were determined at an adjusted Pvalue of ≤0.1. The overlap between both analyses was determined by identifying genes that were either upregulated in cancer and showed positi v e correlation with p63 or were do wnregulated and sho wed negati v e correlation with p63 expr ession. The r esulting dataset was then subjected to a final round of filtering to identify genes that showed corresponding upregulation or downregulation following the depletion of p63 in HNSCC cell lines. The final 430-gene p63 signature was then used as input for unsupervised hierarchical clustering and stra tifica tion of an independent largescale HNSCC dataset generated by Huang et al. ( 37 ) ( http:// www.linkedomics.or g/data do wnload/CPTAC-HNSCC/ ).

HNSCC patient survival analysis
The R package RTCGA.clinical (v 3.5) was used to obtain the clinical data corresponding to HNSCC patients. The clinical data were merged with the expression data and ranked by decreasing expression of the genes of interest.
The 'surv cutpoint' function from the survminer (v0.4.9) R package ( 38 ) was used to determine the outcome-oriented optimal cut point between patients with high and low FST expression and to generate the survival plot.

ChIP-seq analysis
HNSCC cells were processed for chromatin immunoprecipitation sequencing (ChIP-seq) experiments as previously described ( 29 ). Briefly, cells were grown to 80% confluency in 150-mm culture dishes and cross-linked with 1% formaldehyde for 10 min. 200-500 bp chromatin fragments were generated by subjecting the cells to four cycles of sonication. Immunoprecipita tion of chroma tin complexes was accomplished using the following p63 antibodies: (2 g polyclonal antibody Np63 1.1 gift from Dr Borivoj Vojtesek ( 39 )); (1 g polyclonal antibody H129, Santa Cruz) and (2-3 g monoclonal antibody 4A4 ( 40 )). Sequencing libraries were generated from DNA purified from immunoprecipitated chromatin complexes using TruPlex DNA-seq kit from Rubicon. Libraries were sequenced on an illumina HiSeq 2500 machine. The sequencing reads from all experiments were mapped to the Homo sapiens genome (hg19 build) using Bowtie v1.1.1 ( 41 ) with the parameter m = 1 to remove all reads mapping to multiple genomic loci. Peak calling was then performed using MACS2 v2.0.10 ( 42 , 43 ). Irreproducib le discov ery rate (IDR) analysis was used to identify consensus regions across the different p63 experiments. The resulting ChIP-seq Peaks were annotated by mapping them to the nearest gene using default settings on the Genomic Region Enrichment Annotation Tool (GREAT) v4.04 ( 44 ). Visualization of ChIP-seq peaks was done using the Integrati v e Genomics Viewer (Broad Institute), and bigwig files used as input were generated from Chip-seq binary alignment module (bam) files using deeptools v3.3.2 ( 45 ).

Histone ChIP-seq clustering analysis
The fluff ( 46 ) heat map function was used to process enrichment signals from the histone ChIP-seq bam files. The signals were centered on p63 binding sites identified by IDR analysis, and clustered using k -means clustering. The resulting output matrix file containing the cluster assignments was then used as input for the fluff bandplot function to generate the three cluster profiles.

Micr oarra y analysis
The R Bioconductor package affy (v1.62.0) was used to download and normalize the microarray datasets GSE31056 ( 47 ) and GSE30784 ( 48 ) from the GEO. The normalized data were used as input to compare gene expression in normal and cancer tissues.

Single-cell sequencing analysis
Single-cell data from Puram et al. ( 49 ) were downloaded from GSE103322. The data were preprocessed using the Bioconductor Seurat package (v4.1.1) ( 50 ) as previously described ( 51 ) but with the following modifications. First, the entire data set, including data from nonepithelial cells, was used for analysis. Second, variance stabilization and da ta normaliza tion were perf ormed using the sctransf orm function ( 52 ) in Seurat package. Dimension reduction by principal component analysis (PCA), and uniform manifold approximation and projection (UMAP) clustering were performed as part of the workflow. Determination of cluster identity was performed using Bioconductor Sin-gleR package ( 53 ). Finally, the identity of each cluster was verified using the marker genes reported in the original study.

Gene set enrichment analysis
Gene set enrichment analysis was performed using either Metascape ( 54 ) or the w e bsite tool from the Broad Institute ( 55 ) as indicated in the figure legends. Briefly, the DEGs were grouped into upregulated and downregulated genes and separately supplied as queries to the gene set enrichment tool. The resulting annotation was then further processed in R or Affinity Designer to promote readability.

Pr otein-pr otein interaction network analysis
A pr otein-pr otein interaction network (Figure 3 C) was generated by using the 293-target signature gene list as the input for the STRING (v11.4) w e btool (available at stringdb.org). The resulting network was then ported into the Cytosca pe a pp ( 56 , 57 ) by using the automation pipeline plugin on the STRING w e bsite. Ne xt, the gene e xpression data consisting of 293-targets were mapped onto the network to generate a color code that reflects the changes in expression, and the network was rearranged using the 'yfiles tree layout' module to re v eal the underlying structure within the network.

Western blot analysis
W hole-cell lysa tes from cells grown to 80-90% confluency was pr epar ed in Laemmli sample buffer (Bio-Rad). The samples were heated to ∼100 • C for 10 min to denature the proteins before fractionation by SDS-polyacrylamide gel electr ophoresis. Pr oteins were then electr o-transferred to Imm unoblot pol yvinylidene difluoride membranes (Bio-Rad), which were incubated with the following antibodies:

Statistical analysis
Statistical analyses were performed either using R or Microsoft Excel (Microsoft Inc.). Data reported within this manuscript are presented as means ± standard deviations. All experiments were performed at least in triplicates, and statistical significance was determined using two-tailed Student's t tests.

p63 expr ession corr elates with a broad epithelial and differentiation program in HNSCC
To define the transcriptional network associated with p63 in HNSCC ( Figure S1A in the supplemental material), we first identified genes that significantly correlate with p63 expression by using the TPM-normalized TCGA HNSCC data set from GSE62944 ( 36 ). As a first step, we generated pairwise correlation scores for the expression of each gene in relation to p63 expression in cancer and normal tissues using data from the 44 patients with matched normal samples. The inclusion of gene expression values from the normal adjacent tissues, enabled us to set a normal baseline for p63 expression in estimating correlation scores for all other genes. Of the 22565 genes profiled, a large percentage (73%) correlated positi v ely with p63 (Supplementary Table S1), suggesting that p63 acts primarily as a transcriptional activator.
We next compared the transcriptomes of the 44 normal adjacent tissue samples to those of the entire set of 504 HN-SCC samples to identify the p63-correlated genes that are differ entially expr essed between normal and cancer tissues. This analysis identified 14912 differentially expressed genes (DEGs) at a false discovery rate of ≤10% (adjusted P value ≤ 0.1). A comparison of the log2 fold change values of the DEGs with their correlation scores identified 11970 genes that changed in the same direction as that predicted by the p63 corr elation scor es, i.e. the genes wer e either upr egulated and had positi v e corr elation scor es or downr egulated and had negati v e corr elation scor es (Supplementary Table S1).
We reasoned that these correlated DEGs are likely to be involved in p63-regulated tumorigenic processes in HN-SCC. Accordingly, gene ontology (GO) analysis of the top 1500 positi v ely correlated DEGs showed an enrichment in biological processes such as proliferation, cell division and DNA replication (Supplementary Figure S1B). Conversely, the genes that wer e downr egula ted and nega ti v ely correlated with p63 were enriched in dif ferentia tion processes such as kera tiniza tion, osteogenesis, myogenesis and or ganelle or ganization (Supplementary Figure S1C). These r esults ar e consistent with the established role of p63 in the maintenance of stem cells and suppression of dif ferentia tion in HNSCC ( 24 , 28 ). To further explore the biological relevance of these genes, we performed pathway analysis which re v ealed that the positi v ely corr elated DEGs wer e associated with mRN A processing, DN A damage response and TGF-␤ pathways (Supplementary Figure S1D), whereas the negati v ely correlated DEGs were associated with various metabolic and functional pathways, such as lipid and fatty acid metabolism and muscle contraction pathways (Supplementary Figure S1E).
Gi v en the large number of DEGS identified in both screens, we suspected that our list may r epr esent both bonafide targets of p63, as well as genes that are simply a reflection of the profound variation inherent in the tumor samples.

Integrative analysis of HNSCC patient and cell line datasets defines a p63 signature
To evaluate the transcriptomic events specifically regulated by p63, we used lentiviral-based delivery of shRNA to de-plete p63 expression in two well-established HNSCC cell lines, A253 and SCC25 (58)(59)(60)(61). We confirmed diminished p63 expression by two independent shRNAs via western blot analysis in both cell lines (Figure 1 A), and performed RNA-sequencing to determine the global changes in gene expression. We identified ∼8200 differentially expressed genes (DEGs) in A253 cells and 2700 DEGs in SCC25 at a false discovery rate (FDR) of 10% (Supplementary Figure S2A, B). A GO analysis of these DEGs indicated that p63 may promote stemness in cells by r epr essing genes involved in cellular dif ferentia tion (Supplementary Figure  S2C, D) while favoring proliferati v e processes such as transcription, growth factor signaling and energy production (Supplementary Figure S2E, F). This result is well in agreement with the known p63 oncogenic program in SCC cells ( 24 , 28 ).
To identify a consensus p63-mediated transcriptome, we determined the DEGs common to both A253 and SCC25 cells (Figure 1  Gi v en the concor dance of the p63 regulated processes generated from our in-vitro cell line models, we proceeded to integrate the results from the cell-line based analysis (1350 genes) with the results from the HNSCC patient analysis (11 790 genes) to identify a common p63 gene signature. We found that 430 genes tha t correla ted with p63 expression across normal and cancer tissues, were also dysregulated following the loss of p63 expression in both A253 and SCC25 cells (Figure 1 F & Supplementary Table S1).
To determine whether this p63 consensus gene signature can be used to discriminate between normal adjacent tissues and tumors, we performed unsupervised hierarchical clustering on a combined matrix of the data from 44 HN-SCC and normal adjacent tissues as well as an independent human papillomavirus HPV-negati v e HNSCC data set generated by Huang et al. ( 37 ) from 53 normal adjacent tissues and 109 tumor samples. Our p63 consensus gene signatur e segr egated most of the cancer and normal adjacent tissue samples in both datasets (Figure 1 F), suggesting that these genes are critical dri v ers of the oncogenic processes in HNSCC. Notabl y, a GO anal ysis showed that the p63-dri v en consensus gene signature is enriched for biological processes such as migra tion, dif ferentia tion and signal transduction processes (Figure 1 G).

Global genomic occupancy by p63 in SCC cells identifies dir ect tr anscriptional targets
To establish a mechanistic basis for the observed changes in gene expression driven by p63, we next performed ChIPseq experiments in both A253 and SCC25 cells using p63specific antibodies. Pairwise correlation analyses on the replicates for each cell line re v ealed good agreement in both the number and quality of p63 binding sites identified by two p63 antibodies (Supplementary Figure S3A). We also used the irreproducib le discov ery rate to measure the consistency between ChIP-seq replicates and identified 13800 and 37516 sites in A253 and SCC25 cells, respecti v ely (Supplementary Table S2).
A de novo motif analysis using the MEME-ChIP suite ( 62 , 63 ) confirmed the core p63 motif was the most enriched in both cell lines (Supplementary Figure S3B). We used the TOMTOM program ( 64 ) to compare the de novo analysis motif against the human transcription factor database HO-COMOCO ( 65 ), which confirmed that our enriched core motif is identical to the p63 motif ( P = 4.5e-360). In addition to the core motif, this analysis also identified several other enriched TF motifs, including those for the AP-1, KLF and FOX family (Supplementary Figure S3B), suggesting that these TFs may act as transcriptional cofactors of p63. We also used the ChIPseeker program ( 66 ) to evaluate the genomic distribution of p63 peaks relati v e to transcriptional start sites; p63 binding sites were enriched at distal and intragenic regions (Supplementary Figure S3C), which is consistent with the known predilection of p63 to target enhancers ( 67 ).
To better understand the p63 regulatory landscape, we perf ormed f ollow-up ChIP-sequencing analysis with both cell lines for three epigenomic marks: acetylation of lysine 27 on histone 3 (H3K27ac), which marks active regulatory regions, and mono-methylation and trimethylation of lysine 4 on histone 3 (H3K4me1 and H3K4me3, respecti v el y), w hich mark enhancer and promoter targets, respecti v ely ( 68 ). K-means clustering of normalized le v els of histone modifica tions a t each p63-bound region indica ted tha t a pproximatel y 7.9% and 5.6% of p63 binding sites were associated with the promoter H3K4me3 mark in A253 and SCC25 cells, respecti v ely (Supplementary Figure S3D). Surprisingly, less than 23% of the enhancer-associated p63 binding sites were marked by robust H3K27ac. By contrast, the largest proportion of p63 binding was associated with poised enhancers, with strong H3K4me1 marking but weak H3K27ac marking (Supplementary Figure S3D).
GO annotation of the genes associated with the regulatory regions, as determined via GREAT ( 44 ), re v ealed that the different clusters are associated with genes involved in distinct biological processes (Supplementary Figure S3E). The cluster r epr esenting the poised enhancers was associated with genes involved in various differentiation processes such as br anching morphogenesis, ker a tinocyte dif ferentiation and cornification. The cluster r epr esenting acti v e promoters (strong marking for H3K27ac and H3K4me3) was associated with genes involved in de v elopmental processes, such as epidermis de v elopment, e xtracellular matrix organization, wound healing and apoptosis. The cluster r epr esenting acti v e enhancers (strong marking for H3K27ac and H3K4me1) was associated with genes involved in the cell cycle and cell-cell / matrix adhesion processes. Together, these results point to an oncogenic role for p63 in which it dri v es the cellular proliferation program at the expense of differentiation.

P63 occupies super enhancers more frequently than regular enhancers in HNSCC
The presence of an acti v e enhancer cluster prompted us to evaluate the possibility that p63 binds super enhancers, which are large clusters of enhancers that control the expression of cell identity genes and are enriched for lineagespecifying and oncogenic transcription factors, such as p63 ( 69 , 70 ). We mapped the super enhancer landscape by implementing the ROSE algorithm ( 70 , 71 ) on H3K27ac ChIPseq data from both HNSCC cell lines and identified 642 super enhancers in A253 cells, and 963 super enhancers in SCC25 cells which is in agreement with our prior results ( 60 ) (Figure 2 A). We identified the associated genes by mapping each super enhancer to their nearest genes using GREAT.
Interestingly, most of the super enhancers were bound by p63 in both cells, while only small proportion of the regular enhancer r egions wer e associated with p63 binding (Figure  2 B). Ther e wer e 653 genes associated with super enhancers that were common to both cancer cell lines (Figure 2 C), including oncogenes known to regulate aspects of HNSCC biology. For example, TP63 , EGFR and FOSL1 , known regulators of stemness and malignancy (72)(73)(74), were associated with super enhancers encompassing multiple p63 binding sites in HNSCC (Figure 2 D). These results provide further evidence for a role for p63 in driving an oncogenic gene expression program in HNSCC.

P63 directs a highly interconnected EGFR-TGF-␤1 network in HNSCC
We next sought to determine if the super enhancerassociated genes were part of the p63 consensus signature that was identified. We mapped the p63 binding sites from both A253 and SCC25 cells to their putati v e targets using the default (basal plus extension) setting for the GREAT and identified 8824 targets common to both cell lines. A large proportion (68% [291 / 430]) of the genes in the p63 consensus signature were associated with a p63 binding site (Figure 3 A). We therefore posited that there may be functional interactions among the genes in this target signature. To test this, we queried for a pr otein-pr otein interaction network and performed functional enrichment analysis using the STRING database ( 75-77 ) (Figure 3 B). This re v ealed an e xtensi v e interconnected networ k of predicted interactions involving 61 of the 291 targets genes in the p63 consensus signature; the network included key molecules in the epidermal growth factor receptor (EGFR) and TGF-␤1 signaling pathways (Figure 3 C).
The expression pattern of genes in the network was consistent with that in the independent data set generated by Huang et al. ( 37 ) (Figure 4 A), suggesting that these genes may be critical for HNSCC biology. In a parallel analysis, we overlaid the 653 super enhancer associated genes with the genes present in our newly identified EGFR-TGF-␤1 network signature and identified a subset of 17 genes that are common to both lists (Figure 4 B and highlighted in red in Figure 4 A). Among the genes in this subset were several stem cell markers, such as ITGB4 , HMGA2 , IGF2BP2 , COL17A1 and FST (78)(79)(80)(81). To further validate the functional importance of these genes, we examined their expression in two separ ate HNSCC microarr ay datasets ( 47 , 48 ), which re v ealed ITGB4 , FST , GNAI12 and EFNB1 to be among the genes showing significant upregulation in both datasets examined (Figure 4 C & D).

Follistatin is a novel target of p63 in HNSCC
Gi v en the consistent ov ere xpression of FST across multiple HNSCC datasets, and the enrichment of the TGF-␤ network in our analysis, we reasoned that FST may play an important role in the oncogenesis of HNSCC. FST is a secreted glycoprotein that binds and neutralizes the ligands of the TGF-␤ superfamily, and has been shown to play a role in regulating the de v elopment, progression and metastatic dissemination of epithelial cancers (82)(83)(84). Although a role for the p63-FST axis in the regulation of Activin signaling has been shown in mouse salivary glands ( 81 ), a relationship between p63 and FST has not been established in HNSCC.
To confirm whether the p63-dri v en regulation of FST is conserved in HNSCC, we evaluated FST expression in control and p63-depleted A253 and SCC25 cells. The results showed the loss of FST expression following p63 depletion in both cell lines (Supplementary Figure S4A), in agreement with the RNA-seq results that re v ealed FST to be a likely transcriptional target of p63 (Figure 3 C). To probe this further, we examined the FST genomic locus using our ChIPseq data and identified se v eral p63 binding sites within a super enhancer cluster located ∼300 kb from the FST gene (Supplementary Figure S4B). Taken together, these results identify p63 as an upstream oncogenic dri v er of FST e xpression in HNSCC.

P63 mediates EGFR signaling to drive FST expression in HNSCC
Previous studies have reported that ovarian cancer cells increase FST production in response to TGF-␤ stimulation ( 84 ), and FST expression has been shown to coincide with the activation of the signaling cascades downstream of EGF signaling in leukemia cells ( 85 ), howe v er, signaling e v ents driving FST in HNSCC cells remain unknown. Gi v en the enrichment of the EGFR-T GF-␤ netw ork in our dataset, we hypothesized that HNSCC cells might regulate FST expression in response to both signaling pathways. To test this hypothesis, we first determined the specific TGF-␤ superfamily ligands that ar e expr essed in HNSCC tissues using the TCGA dataset. Our analysis re v ealed TGF-β1 , acti vin A ( INHBA ) and BMP7 to be the top three highly expressed ligands in HNSCC (Supplementary Figure S5). We thus treated A253 and SCC25 cells with TGF-␤1, activin, BMP7 or EGF for 4 h and evaluated the expression of FST. Our Western blot results showed that of all the growth factors tested, only EGF treatment led to a consistent increase in FST protein le v els after four hours (Figure 5 A). While these results do not rule out the contribution of other cytokines to the regulation of FST, they suggest that the EGF signaling dominates the early phase of FST regulation.
EGF signaling can be mediated by multiple downstream signaling cascades, including the extracellular signaling  ( 48 ), and in ( D ) the HNSCC microarray data set reported by Reis et al. ( 47 ). Statistical significance was determined using unpaired Student's t tests: * P < 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001. regulated kinases (ERK1 / 2), phosphatidyl-inositol 3 kinase (PI3K-AKT) or signal transducers and activators of signaling (STAT). Activation of any of these downstream molecules via phosphorylation could potentially dri v e the expression of FST. Therefore, to determine which of these downstream molecules dri v e FST e xpression, we e xamined their phosphoryla tion sta tus following trea tment with EGF by western blotting. As shown in Figure 5 B , trea tment of both A253 and SCC25 cells with EGF led to increased phosphorylation of ERK1 / 2, while AKT and STAT pathways did not show significant acti vation. Ne xt, we e valuated the contribution of p63 to the EGF-mediated FST expression by treating both control and p63-depleted cells with EGF. Compared to control cells, p63-depleted cells showed no FST expression following EGF treatment, confirming a central role for p63 as a downstream effector of the EGFmedia ted FST regula tion (Figur e 5 C). These findings wer e verified in a third HNSCC cell line: CAL27 cells, where EGF tr eatment r esulted in the phosphorylation of ERK1 / 2 as well as AKT and STAT3 (Supplementary Figure S7A). Importantly, p63 depletion also blocked the EGF-mediated induction of FST in CAL27 (Supplementary Figure S7B). To independently confirm a role for ERK activation in mediating FST expression, we treated A253 and SCC25 cells with Selumetinib (ADZ6244), a selecti v e inhibitor of ERK phosphorylation. Western blot analysis revealed a dose dependent decrease of FST expression following ERK inhibition (Figure 5 D). Furthermore, ADZ6244 blocked the EGF media ted FST upregula tion in both cells (Figure 5 E). Altogether, these results indicate a central role for the p63-EGF-ERK1 / 2 signaling axis in driving FST expression in HNSCC.

FST r egulates prolifer ation and colony f ormation in HNSCC cells
FST has been associated with stem and progenitor cell function in tissue de v elopment ( 81 , 86 , 87 ), prompting us to consider a similar role in the oncogenic context. To investigate this, we tested two FST targeting shRNA lentiviral constructs, on both cells to examine their efficacy in depleting FST expression. Western blot analysis confirmed both shRNA constructs to be effecti v e, with shFST #2 showing more robust depletion of FST (Figure 6 A). Following the confirmation of FST depletion, we evaluated the effect of FST depletion on cell proliferation, by performing a cell proliferation assay, which showed that the loss of FST resulted in reduced proliferation in both A253 and SCC25 cells (Supplementary Figure S6A, B). Next, to determine if the loss of cell proliferation resulted in the loss of stemness, we investigated the ability of these cells to form colonies. Compared to control cells, FST depleted cells were deficient in their ability to form colonies (Figure 6 B). In contrast, v ector-mediated ov ere xpression of FST (Figure 6 C) enhanced colon y f ormation by A253 cells but suppressed the same in SCC25 cells (Figure 6 D). Together, these analyses suggest that FST regulates proliferation and colon y f orma tion in HNSCC .

Effect of FST on invasion and migration of cancer cells
Gi v en the role of FST as an inhibitor of TGF-␤ signaling, it is possible that it may influence the invasion and migration of HNSCC cells. To investigate the effect of the loss of FST on cell invasion, we subjected both A253 and SCC25 cells to matrix in vasion assa ys using the transwell system, which re v ealed tha t the ef fect of the loss of FST is consistent with promoting invasion in cancer cells (Supplementary Figure S6C, D), although the results were not statistically significant in SCC25 cells. We reasoned that the effect of FST may be more pronounced in a system that closely mimics the 3D ar chitectur e of cancer cells in-vivo. To investigate this, we first grew HNSCC cells for 72 h on ultra-low attachment plates which allow them to self-organize into spheroids. The spheroids were then transferred into individual wells of a 24-well pla te, to facilita te a ttachment and migration to form monolayers (Figure 7 A). This system enabled us to assess migration by calculating the distance covered by the migrating cells, relati v e to the initial ar ea cover ed by the spheroids ( 88 , 89 ). FST-depleted A253 cells showed greater migration than controls, whereas FSTov ere xpressing A253 cells showed reduced migration (Figure 7 B). Howe v er, FST depletion reduced the migration of SCC25 cells, while FST ov ere xpression had no significant effect on the migration of SCC25 cells (Figure 7 C). Parallel studies were performed in a third HNSCC cell line, CAL27 cells, in which FST knockdown and ov ere xpression (Figure S8A, B) produced results similar to those obtained with A253 cells (Figure 7 D). Taken together, our analysis re v eals that FST has a cell-intrinsic role in regulating the migration of SCC cells.

FST expression in the tumor micr oenvir onment is primarily restricted to epithelial cells
Our analysis thus far has focused on the expression of FST by epithelial cells; howe v er, FST could also be e xpressed by the myriad cell types that make up the tumor micr oenvir onment. We reanalyzed the single-cell RNA-seq data set generated from primary HNSCC tissues by Puram et al. ( 49 ) to annotate the identities of the various cells within the tumor micr oenvir onment according to markers reported for the original manuscript. We identified T cells ( CD2 and CD3D ), fibroblasts ( MMP2 and ACTA2 ), macrophages ( CD14 and FCGR2A ), dendritic cells ( CD83 and CD80 ), mast cells ( CMA1 and MS4A2 ), endothelial cells ( PECAM1 and VWF ), B / plasma cells ( BLNK and SLAMF7 ), myocytes ( ACTA1 and MYL2 ) and epithelial / cancer cells ( EPCAM and KRT14 ) (Figures 8 A and S9). We then evaluated the expression profile of FST across the different cellular clusters and found that FST transcripts were predominantly localized to the epithelial subclusters, with sparse expression in cancer-associated fibroblasts (Figure 8 B). As expected, TP63 and EGFR expression were also prominently enriched in the epithelial clusters (Figure 8 C, D). Taken together, we conclude that epithelial cells express both p63 and EGFR and are the major source of FST within the HNSCC micr oenvir onment.

FST expr ession corr elates with the immune micr oenvir onment in HNSCC
The presence of intr a-tumor al immune cell infiltrates, particularly tumor-suppressi v e T lymphocytes, is associated with a better prognosis in HNSCC ( 90 , 91 ). Furthermore, EGF signaling can block the release of immunomodulatory cytokines that promote T cell infiltration in HNSCC ( 92 ), w hereas pharmacolo gical inhibition of EGFR promotes the infiltration of cytotoxic (CD8 + ) T cells in a murine model of HNSCC ( 93 ). Thus, EGF signaling may impede T-cell infiltration in HNSCC tumors and lead to poor patient outcomes. Gi v en our findings that FST is downstream of EGF signaling, we reasoned that there may also be a correlation between FST expression and T-cell infiltration. To investigate this, we generated gene-immune cell association plots using TIMER 2.0 ( 94 ), which re v ealed that the expression of FST , TP63 and EGFR correlated negati v ely with the infiltration of tumor-suppressi v e CD8 + T lymphocytes (Figure 9 A). Conversely, we found that all three genes correlated positi v ely with the infiltration of tumor-promoting myeloidderi v ed suppressor cells (Figure 9 B).
On the basis of the above-described correlations, we also reasoned that FST-enriched epithelial cancer cells may influence the immune tumor micr oenvir onment and thus affect the outcomes of patients with HNSCC. To investigate this, we stratified the TCGA-HNSCC patients into two cohorts, consisting of 398 FST-high patients, and 74 FST-low pa tients and investiga ted the prognostic value of FST by Ka plan-Meier anal ysis. As shown in Figure 9 C, high FST expr ession corr ela ted nega ti v ely with patient survi val, indicating a pro-tumorigenic role of FST. Collecti v ely these data lend credence to the idea that EGFR-p63-FST signaling in the complex tumor micr oenvir onment of HNSCC modulates the immune response.

DISCUSSION
Our limited understanding of the molecular etiology and complexity of HNSCC remains a barrier to effecti v e treatment. In particular, there is a need to identify molecular biomarkers and crucial drivers of HNSCC, which can be exploited to predict disease progr ession, r esponse to tr eatment and importantly, re v eal ne w therapeutic targets. Gi v en the pivotal role of p63 in the development, progression, and metasta tic dissemina tion of various SCCs, including HN-SCC, the identification and characterization of the criti-cal downstream network of targets that mediate the oncogenic function of p63 remain a priority that is likely to yield valuable insights. We integrated transcriptomic data from TCGA-HNSCC tumors and HNSCC cell lines to define a conserved and functionally relevant p63 target gene signature. Our analyses re v ealed FST as a direct target of p63 and a novel marker of HNSCC oncogenesis.
Large-scale transcriptomic datasets from HNSCC patients can be le v eraged to better prioritize actionable p63 target genes in HNSCC. We evaluated p63 le v els and associated gene expression patterns between the tumor and adjacent normal tissues, which re v ealed cancer-related genes whose expression correlated with that of p63 and thus suggested a possible functional link. The use of HNSCC cell lines deri v ed from patients enabled us to uncover the epigenetic landscape of p63-bound regulatory elements, particularly those that r epr esent acti v e enhancers. We posit that the H3K27Ac-enriched super enhancers we identified are associated with genes that are likely to occupy higher positions in the p63 oncogenic networ k. Notab ly, this list of super enhancer-associated genes consists of se v eral known and emerging oncogenic factors such as SREBF1 , which encodes a transcription factor tha t regula tes tumor metabolism and drug resistance ( 95 , 96 ), and FSCN1 and HAS3 , two oncogenic factors that promote the growth and metastasis of tumor cells ( 58 , 97 ). We also identified genes that are anti-correlated to p63 expression, including putati v e tumor suppressors such as EHF , which we and others have shown to be an important regulator of tumor redox homeostasis and the immune environment ( 51 , 98 ), and CXXC5 , an unmethylated CpG binder that likely acts as a nucleation factor for transcription factors, coregulatory proteins and DNA modifiers ( 99 ).
An interesting subset of the 61 genes within the p63 signature we identified was associated with an integrated EGFR-T GF-␤ netw ork. Moreo ver, these p63-target genes showed a distinct cancer-enriched expression pattern compared to that of normal tissue in an independent primary HNSCC data set, suggesting they r epr esent a core oncogenic signature of HNSCC biology ( 37 ). The functional link between p63 and TGF-␤ and other signaling pathways is exemplified by studies showing that the downregulation of p63 is r equir ed for TGF-␤-induced metastasis in SCC ( 100 ), and that the loss of p63 during HNSCC de v elopment dri v es tumor metastasis via mitogen-activated protein kinase signaling ( 101 ). The cooperation between EGFR and TGF-␤1 signaling is also an important dri v er of metastasis, as shown in breast cancer cells ( 102 , 103 ). We posit that this cooperation is similarly involved in the recently described EGF-mediated epithelial-to-mesenchymal transition in HNSCC ( 89 ). Our findings suggest that an EGFR-TGF-␤ signaling axis involving p63 dri v es HNSCC onco-genesis and that this function could partly be mediated by FST.
We uncover several interesting aspects of FST biology and function in HNSCC: first, we identified FST as super enhancer-associated gene in HNSCC whose expression is consistently downregulated in the absence of p63; second, we demonstrated that FST intrinsically mediates HNSCC cell clonogenicity and migration, presumably by neutralizing TGF-␤/ activin signaling. These observations agree with previous studies showing that depletion of activin A stops the migration of oral cancer cells ( 104 ). We postula te tha t the loss of p63 relie v es the inhibition of TGF-␤/ activin signaling via FST thereby promoting tumor epithelial cell migration and metastasis, and these processes are intertwined with cross-talk between cancer cells and other key components in the TME such as fibroblasts, as in the case of prostate cancer ( 105 ). Taken together, these observations and findings offer a hitherto unappreciated pathway in which EGF-dri v en regulation of FST expression is mediated by p63 in HNSCC.
We found that the expression of EGFR , TP63 and FST correlates with immune infiltration in the tumor microen- vironment, such that their expression is linked not only with the absence of tumor-eliminating T lymphocytes but also with the presence of tumor-promoting myeloid-deri v ed immune-suppr essor cells. These r esults agr ee with published reports implicating tumoral EGF signaling as a major suppressor of T cell infiltration ( 92 ). Interestingly, FST dri v es resistance to immune checkpoint inhibitors in ovarian cancer cells ( 84 ), but futur e r esear ch is needed to determine whether similar mechanisms are at play in HNSCC. Nevertheless, FST r epr esents a novel modulator of the complex EGFR-TGF-␤-p63 axis and the tumor micr oenvir onment in HNSCC. Our findings also highlight FST as a potential prognostic factor in HNSCC and that its negati v e effect on disease progression may result from a possible metastatic suppressor function as shown in breast cancer ( 82 ).
One limitation of our study is that we did not consider the differences in the mutational landscape or inherent molecular subtypes of the r epr esentati v e HNSCC cells. For example, A253 and CAL27 cells harbor mutations in the TGFBR2 and TGFBR1 genes respecti v el y, w hereas SCC25 cells express wildtype receptors -such differences may explain some of the discrepant functional outcomes that we observed in our studies ( 106 , 107 ). Future research efforts will involve a systematic and tumor subtype-specific approach to evaluating the role of FST in HNSCC. Such studies will re v eal specific vulnerabilities of tumors that may lead to the de v elopment and adoption of FST b locking antibodies in the routine clinical care of HNSCC patients.

DA T A A V AILABILITY
All sequencing data generated in this study have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE212752.

SUPPLEMENT ARY DA T A
Supplementary Data are available at NAR Cancer Online.

ACKNOWLEDGEMENTS
We thank Jonathan Bard and Brandon Marzullo for bioinformatics support and the Uni v ersity a t Buf falo Genomics and Bioinformatics Core for the next-generation sequencing services. We thank the Roswell Park Gene Modulation Service Shar ed Resour ce for help with the generation of lentiviral constructs as well as the Uni v ersity at Buffalo Center for Computation Research (UB CCR) for the provision of computational r esour ces used in our analysis. We also thank Dr Karen Dietz for help with editing this manuscript. Conflict of interest statement. The authors declare no potential conflicts of interest with respect to the authorship and / or publication this article.